Gesture recognition

Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current focuses in the field include emotion recognition from the face and hand gesture recognition. Many approaches use cameras and computer vision algorithms to interpret sign language. However, the identification and recognition of posture, gait, proxemics, and human behaviors are also the subject of gesture recognition techniques.[1]

Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of input to keyboard and mouse.

Gesture recognition enables humans to interface with machines (HMI) and interact naturally without any mechanical devices. Using the concept of gesture recognition, it is possible to point a finger at the computer screen so that the cursor moves accordingly. This could potentially make conventional input devices such as mice, keyboards and even touch-screens redundant.
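As a minimal illustration of the pointing scenario above, the following Python sketch maps a fingertip position reported by a hypothetical tracker (in normalized camera coordinates) to cursor coordinates on the screen; the function name, the coordinate convention and the mirroring are assumptions for illustration, not part of any particular system.

    def fingertip_to_cursor(u, v, screen_w=1920, screen_h=1080):
        """Map a fingertip position in normalized camera coordinates
        (u, v in [0, 1], origin at the top-left) to pixel coordinates
        on a screen of the given resolution. The horizontal axis is
        mirrored so that moving the hand right moves the cursor right.
        """
        u = min(max(u, 0.0), 1.0)  # clamp to the valid range
        v = min(max(v, 0.0), 1.0)
        x = int((1.0 - u) * (screen_w - 1))  # mirror the camera image
        y = int(v * (screen_h - 1))
        return x, y

    # A fingertip detected slightly left of the (mirrored) image centre:
    print(fingertip_to_cursor(0.4, 0.5))  # -> (1151, 539)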

Gesture recognition can be conducted with techniques from computer vision and image processing.

The literature includes ongoing work in the computer vision field on capturing gestures or more general human pose and movements by cameras connected to a computer.[2][3][4][5]



Gesture types

In computer interfaces, two types of gestures are distinguished:[6] online gestures, which are direct manipulations such as scaling and rotating, and offline gestures, which are processed after the interaction is finished (for example, a circle is drawn to activate a context menu).
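The distinction can be made concrete with a small sketch. In the hypothetical Python handler below, an online gesture (a pinch) updates the interface on every input event while the interaction is in progress, whereas the offline gesture (a drawn circle) is classified only once the stroke is finished; all names and thresholds are illustrative assumptions.

    import math

    class GestureHandler:
        """Illustrative dispatcher separating online and offline gestures."""

        def __init__(self):
            self.stroke = []  # points accumulated for offline recognition
            self.zoom = 1.0   # state driven directly by the online gesture

        def on_move(self, point, pinch_scale=None):
            # Online gesture: a pinch is a direct manipulation and is
            # applied immediately, on every input event.
            if pinch_scale is not None:
                self.zoom *= pinch_scale
            self.stroke.append(point)

        def on_release(self):
            # Offline gesture: classified only after the interaction
            # ends, e.g. a drawn circle activating a context menu.
            if self._is_closed_loop(self.stroke):
                print("circle gesture -> open context menu")
            self.stroke = []

        @staticmethod
        def _is_closed_loop(points, tol=20.0):
            # Crude test: the stroke ends near where it started.
            if len(points) < 8:
                return False
            (x0, y0), (x1, y1) = points[0], points[-1]
            return math.hypot(x1 - x0, y1 - y0) < tol

    # A roughly circular stroke, classified once the input ends:
    h = GestureHandler()
    for t in range(16):
        h.on_move((100 + 10 * math.cos(t), 100 + 10 * math.sin(t)))
    h.on_release()  # prints: circle gesture -> open context menu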

Uses

Gesture recognition is useful for processing information from humans that is not conveyed through speech or typing. There are also various types of gestures that can be identified by computers.

Input devices

The ability to track a person's movements and determine what gestures they may be performing can be achieved through various tools. Although a large amount of research has been done on image- and video-based gesture recognition, the tools and environments used vary between implementations.

Algorithms

Depending on the type of input data, a gesture can be interpreted in different ways. However, most techniques rely on key points represented in a 3D coordinate system. Based on the relative motion of these points, a gesture can be detected with high accuracy, depending on the quality of the input and the algorithm's approach.
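As a sketch of this idea, the Python fragment below classifies a horizontal swipe from the relative motion of a single tracked key point (for example, the wrist) given as a sequence of 3D positions; the thresholds, and the assumption that such positions are available from a tracker, are illustrative.

    import numpy as np

    def classify_swipe(track, min_dist=0.3):
        """Classify a horizontal swipe from a sequence of 3D key-point
        positions (shape: frames x 3, in metres). Only relative motion
        is used, so the result does not depend on where the user stands.
        """
        track = np.asarray(track, dtype=float)
        dx, dy, dz = track[-1] - track[0]  # net displacement
        # Require sufficiently large, clearly dominant horizontal motion.
        if abs(dx) >= min_dist and abs(dx) > 2 * max(abs(dy), abs(dz)):
            return "swipe_right" if dx > 0 else "swipe_left"
        return None

    # A wrist moving 40 cm to the right over the captured frames:
    track = np.linspace([0.0, 1.2, 2.0], [0.4, 1.22, 2.0], num=30)
    print(classify_swipe(track))  # -> swipe_right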
In order to interpret movements of the body, one has to classify them according to common properties and the message the movements may express. For example, in sign language each gesture represents a word or phrase. A taxonomy well suited to human-computer interaction has been proposed by Quek in "Toward a Vision-Based Hand Gesture Interface".[29] He presents several interactive gesture systems intended to capture the whole space of gestures: 1. manipulative; 2. semaphoric; 3. conversational.

Some literature differentiates two approaches to gesture recognition: 3D model-based and appearance-based.[30] The former makes use of 3D information about key elements of the body parts in order to obtain several important parameters, like palm position or joint angles. Appearance-based systems, on the other hand, use images or videos for direct interpretation.

3D model-based algorithms

The 3D model approach can use volumetric or skeletal models, or even a combination of the two. Volumetric approaches have been heavily used in the computer animation industry and for computer vision purposes. The models are generally created from complicated 3D surfaces, like NURBS or polygon meshes.

The drawback of this method is that it is very computationally intensive, and systems for live analysis are still to be developed. For the moment, a more practical approach is to map simple primitive objects to the person's most important body parts (for example, cylinders for the arms and neck, a sphere for the head) and analyse the way these interact with each other. Furthermore, some abstract structures like superquadrics and generalised cylinders may be even more suitable for approximating the body parts. An appealing property of this approach is that the parameters for these objects are quite simple. To better model the relations between the parts, constraints and hierarchies between the objects are used.
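A minimal sketch of the primitive-based modelling just described, assuming made-up coordinates and tolerances: each body part is a simple parameterized solid, the parent link encodes the hierarchy, and a constraint checks that connected parts stay attached.

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Cylinder:
        """A primitive approximating a body part, e.g. an arm segment."""
        name: str
        start: np.ndarray          # 3D position of one end (e.g. shoulder)
        end: np.ndarray            # 3D position of the other end (e.g. elbow)
        radius: float
        parent: "Cylinder | None" = None  # hierarchy: part it attaches to

        def attached_to_parent(self, tol=0.02):
            # Constraint: a child's start must coincide with its parent's end.
            if self.parent is None:
                return True
            return np.linalg.norm(self.start - self.parent.end) < tol

    upper_arm = Cylinder("upper_arm", np.array([0.0, 1.4, 0.0]),
                         np.array([0.0, 1.1, 0.0]), radius=0.05)
    forearm = Cylinder("forearm", np.array([0.0, 1.1, 0.0]),
                       np.array([0.2, 0.9, 0.0]), radius=0.04, parent=upper_arm)
    print(forearm.attached_to_parent())  # -> True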

Skeletal-based algorithms

Instead of using intensive processing of the 3D models and dealing with a lot of parameters, one can just use a simplified version of joint angle parameters along with segment lengths. This is known as a skeletal representation of the body, where a virtual skeleton of the person is computed and parts of the body are mapped to certain segments. The analysis is done using the position and orientation of these segments and the relations between them (for example, the angle between joints and the relative position or orientation).
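For instance, a joint angle can be recovered from three consecutive joint positions of the virtual skeleton. The Python sketch below, with made-up coordinates, computes the elbow angle from the shoulder, elbow and wrist positions.

    import numpy as np

    def joint_angle(a, b, c):
        """Angle (degrees) at joint b formed by the segments b->a and b->c,
        e.g. the elbow angle from shoulder, elbow and wrist positions."""
        u = np.asarray(a, float) - np.asarray(b, float)
        v = np.asarray(c, float) - np.asarray(b, float)
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Shoulder, elbow and wrist of a right-angled arm (metres):
    print(joint_angle([0.0, 1.4, 0.0], [0.0, 1.1, 0.0], [0.3, 1.1, 0.0]))  # 90.0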

Advantages of using skeletal models:

- Algorithms are faster because only key parameters are analysed.
- Pattern matching against a template database is possible.
- Using key points allows the detection program to focus on the significant parts of the body.

Appearance-based models

These models no longer use a spatial representation of the body; instead, they derive the parameters directly from the images or videos using a template database. Some are based on deformable 2D templates of parts of the human body, particularly hands. Deformable templates are sets of points on the outline of an object, used as interpolation nodes for approximating the object's outline. One of the simplest interpolation functions is linear, which produces an average shape from point sets, point variability parameters and external deformators. These template-based models are mostly used for hand tracking, but could also be of use for simple gesture classification.
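A minimal sketch of such linear interpolation: a shape is generated as the average outline plus a weighted sum of variability (deformation) modes. The mean shape and the single mode below are made up; in practice they would come from a template database.

    import numpy as np

    def deform(mean_shape, modes, params):
        """Linear deformable template: outline points are the average
        shape plus a weighted sum of variability modes.

        mean_shape: (n, 2) average outline points
        modes:      (k, n, 2) deformation modes from the template database
        params:     (k,) variability parameters (weights)
        """
        return mean_shape + np.tensordot(params, modes, axes=1)

    # Toy example: a 4-point outline with one mode that widens the shape.
    mean_shape = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
    widen = np.array([[[-1., 0.], [1., 0.], [1., 0.], [-1., 0.]]])
    print(deform(mean_shape, widen, np.array([0.1])))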

A second approach to gesture detection using appearance-based models uses image sequences as gesture templates. Parameters for this method are either the images themselves or certain features derived from them. Most of the time, only one (monoscopic) or two (stereoscopic) views are used.
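Under the simplest possible distance, such matching reduces to comparing the observed sequence frame-by-frame against each stored template, as in the Python sketch below (templates are assumed to be the same length as the observation; real systems must also handle temporal alignment, for example with dynamic time warping).

    import numpy as np

    def match_gesture(sequence, templates):
        """Return the name of the stored gesture template closest to the
        observed sequence, by mean per-frame squared difference.

        sequence:  (frames, h, w) grayscale frames
        templates: dict mapping name -> array of the same shape
        """
        seq = np.asarray(sequence, dtype=float)
        scores = {name: np.mean((seq - np.asarray(t, float)) ** 2)
                  for name, t in templates.items()}
        return min(scores, key=scores.get)

    # Toy example with 4x4 frames: the observation is a noisy "wave".
    rng = np.random.default_rng(0)
    wave, push = rng.random((5, 4, 4)), rng.random((5, 4, 4))
    observed = wave + 0.05 * rng.standard_normal((5, 4, 4))
    print(match_gesture(observed, {"wave": wave, "push": push}))  # -> wave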

Challenges

There are many challenges associated with the accuracy and usefulness of gesture recognition software. For image-based gesture recognition, there are limitations imposed by the equipment used and by image noise. Images or video may not be captured under consistent lighting or from the same location. Items in the background or distinct features of the users may make recognition more difficult.

The variety of implementations for image-based gesture recognition may also raise issues for the viability of the technology for general usage. For example, an algorithm calibrated for one camera may not work for a different camera. The amount of background noise also causes tracking and recognition difficulties, especially when occlusions (partial and full) occur. Furthermore, the distance from the camera, and the camera's resolution and quality, also cause variations in recognition accuracy.

In order to capture human gestures by visual sensors, robust computer vision methods are also required, for example for hand tracking and hand posture recognition[31][32][33][34][35][36][37][38][39] or for capturing movements of the head, facial expressions or gaze direction.

"Gorilla arm"

"Gorilla arm" was a side-effect of vertically-oriented touch-screen or light-pen use. In periods of prolonged use, users' arms began to feel fatigue and/or discomfort. This effect contributed to the decline of touch-screen input despite initial popularity in the 1980s.[40][41]

Gorilla arm is not a problem for short-term uses, since these involve only brief interactions that do not last long enough for fatigue to set in.


References

  1. ^ Matthias Rehm, Nikolaus Bee, Elisabeth André, Wave Like an Egyptian - Accelerometer Based Gesture Recognition for Culture Specific Interactions, British Computer Society, 2007
  2. ^ Pavlovic, V., Sharma, R. & Huang, T. (1997), "Visual interpretation of hand gestures for human-computer interaction: A review", IEEE Trans. Pattern Analysis and Machine Intelligence, July 1997, Vol. 19(7), pp. 677-695.
  3. ^ R. Cipolla and A. Pentland, Computer Vision for Human-Machine Interaction, Cambridge University Press, 1998, ISBN 978-0521622530
  4. ^ Ying Wu and Thomas S. Huang, "Vision-Based Gesture Recognition: A Review", In: Gesture-Based Communication in Human-Computer Interaction, Volume 1739 of Springer Lecture Notes in Computer Science, pages 103-115, 1999, ISBN 978-3-540-66935-7, doi 10.1007/3-540-46616-9
  5. ^ Alejandro Jaimesa and Nicu Sebe, Multimodal human–computer interaction: A survey, Computer Vision and Image Understanding Volume 108, Issues 1-2, October–November 2007, Pages 116-134 Special Issue on Vision for Human-Computer Interaction, doi:10.1016/j.cviu.2006.10.019
  6. ^ We consider online gestures, which can also be regarded as direct manipulations like scaling and rotating. In contrast, offline gestures are usually processed after the interaction is finished; e.g. a circle is drawn to activate a context menu.
  7. ^ Thad Starner, Alex Pentland, Visual Recognition of American Sign Language Using Hidden Markov Models, Massachusetts Institute of Technology
  8. ^ Kai Nickel, Rainer Stiefelhagen, Visual recognition of pointing gestures for human-robot interaction, Image and Vision Computing, vol 25, Issue 12, December 2007, pp 1875-1884
  9. ^ Lars Bretzner and Tony Lindeberg "Use Your Hand as a 3-D Mouse ...", Proc. 5th European Conference on Computer Vision (H. Burkhardt and B. Neumann, eds.), vol. 1406 of Lecture Notes in Computer Science, (Freiburg, Germany), pp. 141--157, Springer Verlag, Berlin, June 1998.
  10. ^ Matthew Turk and Mathias Kölsch, "Perceptual Interfaces", University of California, Santa Barbara UCSB Technical Report 2003-33
  11. ^ M Porta "Vision-based user interfaces: methods and applications", International Journal of Human-Computer Studies, 57:11, 27-73, 2002.
  12. ^ Afshin Sepehri, Yaser Yacoob, Larry S. Davis "Employing the Hand as an Interface Device", Journal of Multimedia, vol 1, number 2, pages 18-29
  13. ^ Henriksen, K., Sporring, J., Hornbaek, K., "Virtual trackballs revisited", IEEE Transactions on Visualization and Computer Graphics, Volume 10, Issue 2, pages 206-216, 2004
  14. ^ William Freeman, Craig Weissman, Television control by hand gestures, Mitsubishi Electric Research Lab, 1995
  15. ^ Do Jun-Hyeong, Jung Jin-Woo, Sung hoon Jung, Jang Hyoyoung, Bien Zeungnam, Advanced soft remote control system using hand gesture, Mexican International Conference on Artificial Intelligence, 2006
  16. ^ K. Ouchi, N. Esaka, Y. Tamura, M. Hirahara, M. Doi, Magic Wand: an intuitive gesture remote control for home appliances, International Conference on Active Media Technology, 2005 (AMT 2005), 2005
  17. ^ Lars Bretzner, Ivan Laptev, Tony Lindeberg, Sören Lenman, Yngve Sundblad "A Prototype System for Computer Vision Based Human Computer Interaction", Technical report CVAP251, ISRN KTH NA/P--01/09--SE. Department of Numerical Analysis and Computer Science, KTH (Royal Institute of Technology), SE-100 44 Stockholm, Sweden, April 23–25, 2001.
  18. ^ "Wired glove", Wikipedia, the free encyclopedia, retrieved 20 March 2011.
  19. ^ Thomas G. Zimmerman, Jaron Lanier, Chuck Blanchard, Steve Bryson and Young Harvill, "A Hand Gesture Interface Device", http://portal.acm.org.
  20. ^ Yang Liu, Yunde Jia, A Robust Hand Tracking and Gesture Recognition Method for Wearable Visual Interfaces and Its Applications, Proceedings of the Third International Conference on Image and Graphics (ICIG’04), 2004
  21. ^ Kue-Bum Lee, Jung-Hyun Kim, Kwang-Seok Hong, An Implementation of Multi-Modal Game Interface Based on PDAs, Fifth International Conference on Software Engineering Research, Management and Applications, 2007
  22. ^ Per Malmestig, Sofie Sundberg, SignWiiver - implementation of sign language technology
  23. ^ Thomas Schlomer, Benjamin Poppinga, Niels Henze, Susanne Boll, Gesture Recognition with a Wii Controller, Proceedings of the 2nd international Conference on Tangible and Embedded interaction, 2008
  24. ^ AiLive Inc., LiveMove White Paper, 2006
  25. ^ Electronic Design September 8, 2011. William Wong. Natural User Interface Employs Sensor Integration.
  26. ^ Cable & Satellite International September/October, 2011. Stephen Cousins. A view to a thrill.
  27. ^ TechJournal South January 7, 2008. Hillcrest Labs rings up $25M D round.
  28. ^ Wei Du, Hua Li, Vision based gesture recognition system with single camera, 5th International Conference on Signal Processing Proceedings, 2000
  29. ^ Quek, F., “Toward a vision-based hand gesture interface” Proceedings of the Virtual Reality System Technology Conference, pp. 17-29, August 23–26, 1994, Singapore
  30. ^ Vladimir I. Pavlovic, Rajeev Sharma, Thomas S. Huang, Visual Interpretation of Hand Gestures for Human-Computer Interaction; A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
  31. ^ Ivan Laptev and Tony Lindeberg "Tracking of Multi-state Hand Models Using Particle Filtering and a Hierarchy of Multi-scale Image Features", Proceedings Scale-Space and Morphology in Computer Vision, Volume 2106 of Springer Lecture Notes in Computer Science, pages 63-74, Vancouver, BC, 1999. ISBN 978-3-540-42317-1, doi 10.1007/3-540-47778-0
  32. ^ Christian von Hardenberg and François Bérard, "Bare-hand human-computer interaction", ACM International Conference Proceeding Series; Vol. 15 archive Proceedings of the 2001 workshop on Perceptive user interfaces, Orlando, Florida, Pages: 1 - 8, 2001
  33. ^ Lars Bretzner, Ivan Laptev, Tony Lindeberg "Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering", Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA, 21–21 May 2002, pages 423-428. ISBN 0-7695-1602-5, doi 10.1109/AFGR.2002.1004190
  34. ^ Domitilla Del Vecchio, Richard M. Murray Pietro Perona, "Decomposition of human motion into dynamics-based primitives with application to drawing tasks", Automatica Volume 39, Issue 12, December 2003, Pages 2085-2098 , doi:10.1016/S0005-1098(03)00250-4.
  35. ^ Thomas B. Moeslund and Lau Nørgaard, "A Brief Overview of Hand Gestures used in Wearable Human Computer Interfaces", Technical report: CVMT 03-02, ISSN: 1601-3646, Laboratory of Computer Vision and Media Technology, Aalborg University, Denmark.
  36. ^ M. Kolsch and M. Turk "Fast 2D Hand Tracking with Flocks of Features and Multi-Cue Integration", CVPRW '04. Proceedings Computer Vision and Pattern Recognition Workshop, May 27-June 2, 2004, doi 10.1109/CVPR.2004.71
  37. ^ Xia Liu, Fujimura, K., "Hand gesture recognition using depth data", Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, May 17–19, 2004, pages 529-534, ISBN 0-7695-2122-3, doi 10.1109/AFGR.2004.1301587.
  38. ^ Stenger B, Thayananthan A, Torr PH, Cipolla R: "Model-based hand tracking using a hierarchical Bayesian filter", IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1372-84, Sep 2006.
  39. ^ A Erol, G Bebis, M Nicolescu, RD Boyle, X Twombly, "Vision-based hand pose estimation: A review", Computer Vision and Image Understanding Volume 108, Issues 1-2, October–November 2007, Pages 52-73 Special Issue on Vision for Human-Computer Interaction, doi:10.1016/j.cviu.2006.10.012.
  40. ^ Windows 7? No arm in it - Mixed Signals - Rupert Goodwins's Blog at ZDNet.co.uk Community
  41. ^ The Jargon File - Gorilla Arm
